DOCUMENT RESUME

ED 208 061                                        TM 810 832

TITLE          Accountability Testing Handbook.
INSTITUTION    Montgomery County Public Schools, Rockville, Md.
PUB DATE       Aug 80
NOTE           [page count illegible]
EDRS PRICE     MF01/PC04 Plus Postage.
DESCRIPTORS    Definitions; Elementary Secondary Education;
               Objectives; Scores; *Standardized Tests; *Test
               Format; *Testing; *Test Interpretation
IDENTIFIERS    *California Achievement Tests; Montgomery County
               Public Schools MD; Test Reporting



ABSTRACT

The purpose of this handbook is to acquaint principals and teachers with the California Achievement Tests, mandated by the Maryland State Department of Education. Reports of the test results are also discussed. The first chapter describes the test and provides examples of question formats. A table of the objectives measured is also included. The second chapter presents reports that are distributed to the schools, and an explanation of the data on the reports with suggestions for their use. Also included are the School Frequency Distributions, Mean Score Report, Percent Correct by Objective, and the Individual Test Report. Commonly used technical testing terms are defined in the final chapter. (Author/GK)

Reproductions supplied by EDRS are the best that can be made from the original document.



MONTGOMERY COUNTY 
PUBLIC SCHOOLS 



ACCOUNTABILITY
TESTING
HANDBOOK

DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

X This document has been reproduced as received from the person or organization originating it.
  Minor changes have been made to improve reproduction quality.

  Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.

August 1980

"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



ACCOUNTABILITY
TESTING
HANDBOOK

Department of Educational Accountability
MONTGOMERY COUNTY PUBLIC SCHOOLS
Rockville, Maryland



TABLE OF CONTENTS

Introduction

Chapter 1: California Achievement Tests
    1A. Description of California Achievement Tests
    1B. Skills Measured
    1C. Question Formats

Chapter 2: Reporting Test Results
    2A. School Frequency Distribution and Mean Score Report
    2B. Reporting Data From Longitudinal Analyses
    2C. Percent Correct By Objective
    2D. Individual Test Report
    2E. Prominent Guidelines for Interpreting Test Data

Chapter 3: Technical Testing Terms



INTRODUCTION

The purpose of this handbook is to acquaint principals and teachers with the California Achievement Tests, the standardized test mandated by the Maryland State Department of Education. Reports of the results from the California Tests are also discussed in order to assist staff in interpreting them accurately and completely.

Chapter 1 describes the test and provides examples of question formats for the purpose of familiarizing teachers with the types of questions and directions used throughout the subtests. Also included in this chapter is a table of the objectives measured by each subtest at Levels 13-19.

Chapter 2 presents reports that are distributed to the schools, an explanation of the data on the reports, and suggestions for accurate use of these data. The School Frequency Distributions give schools a picture of both their typical achievement and the variation of achievement in the school. The Mean Score Report summarizes the typical achievement in each subtest. A discussion of longitudinal and nonlongitudinal analyses describes how to overcome some weaknesses of the Mean Score Report. The "Percent Correct by Objective" report provides school, area, and county summary information on the objectives measured by each subtest. Finally, the "Individual Test Report" provides data on individual students.

The final chapter presents the more commonly used technical testing terms with definitions, as well as some important precautions about their use.

CHAPTER 1

CALIFORNIA ACHIEVEMENT TESTS

The major purpose of this chapter is to describe the characteristics of the California Achievement Tests, to provide a list of the skills measured by each subtest and to provide examples of question formats used in the California subtests, Levels 13 to 19. The purpose is not to inform teachers of what the test items are. The examples used are meant to acquaint teachers with the types of questions used and alert them to the need for careful attention to directions. There is no intent to "teach to the test" nor to give last minute training to students. Finally, the examples are not meant as indications of what has to be taught.



1A. DESCRIPTION OF CALIFORNIA ACHIEVEMENT TESTS

The California Achievement Tests replace the Iowa Tests of Basic Skills (ITBS) and the Tests of Academic Progress (TAP) as the standardized norm-referenced¹ test used for systemwide testing. There are a few basic changes, as well as many similarities.

There are five major content areas measured on the California Achievement Tests (shown in Table 1.1). They are Reading, Spelling, Language, Math and Reference Skills. Most levels of the California Tests measure the same content areas as the ITBS. However, some areas measured on separate subtests by the ITBS have been combined into one subtest on the California Tests. This occurs with Punctuation and Capitalization, measured separately on the ITBS, but included in the same subtest on the California. In addition, the three ITBS subtests dealing with Reference Skills have been combined into one subtest on the California. On the other hand, Mathematics Computation, a subtest on the California, is not directly measured on the ITBS.

¹ Norm-referenced is explained in Chapter 3.



TABLE 1.1

COMPARISON OF SUBTESTS ON THE CALIFORNIA ACHIEVEMENT TESTS (Levels 13-19)
AND IOWA TESTS OF BASIC SKILLS (ITBS)

CALIFORNIA                                 ITBS

Phonic Analysis (Level 13 only)
Structural Analysis (Level 13 only)
Reading Vocabulary                         Vocabulary
Reading Comprehension                      Reading Comprehension

Spelling                                   Spelling

Language Mechanics                         Punctuation
                                           Capitalization
Language Expression                        Language Usage

Mathematics Computation
Mathematics Concepts and Applications      Mathematics Concepts
                                           Mathematics Problem Solving

Reference Skills                           Reference Skills
                                           Map Reading
                                           Graphs and Tables



Level 13 of the California, which can be used in Grade 3, has two additional reading sections, Phonic Analysis and Structural Analysis. It does not have a Reference Skills section.

The content covered in the California Tests does not match as closely with the TAP (given in Grade 11) as it does with the ITBS, as shown in Table 1.2. The California Tests include three content areas not measured by the TAP. These are Reading Vocabulary, Mathematics Computation, and Reference Skills. However, Social Studies, Science, and Literature, all measured by the TAP, are not covered in the California Test Battery.

Like the ITBS, the California Achievement Tests include several total scores which are combinations of subtest scores. Total Reading is a combination of the following subtests: Phonic Analysis (Level 13 only), Structural Analysis (Level 13 only), Reading Vocabulary and Reading Comprehension. Total Language is made up of the Language Mechanics and Language Expression subtests. Spelling is not included in the Language Total. There is also a Total Mathematics score that is composed of the Mathematics Computation and the Mathematics Concepts and Applications subtests. Finally, the Total Battery score is a combination of all of the above subtests. The Reference Skills subtest (Levels 14 to 19) is not included in the Total Battery score.



TABLE 1.2

COMPARISON OF SUBTESTS ON THE CALIFORNIA ACHIEVEMENT TESTS (Levels 13-19)
AND TESTS OF ACADEMIC PROGRESS (TAP)

CALIFORNIA                                 TAP

Reading Vocabulary
Reading Comprehension                      Reading

Spelling
Language Mechanics
Language Expression                        English

Mathematics Computation
Mathematics Concepts and Applications      Mathematics

Reference Skills

                                           Social Studies
                                           Science
                                           Literature



1B. SKILLS MEASURED

Table 1.4 shows the skills measured by each subtest of the California Achievement Tests, Levels 13 to 19. More detailed descriptions of the objectives on each level can be found in the Class Management Guide for the California published by CTB/McGraw-Hill. A copy will be available in each school.

The publisher recommends each test level for a range of grades, and the ranges overlap. The levels and recommended grade ranges are shown in Table 1.3.



TABLE 1.3

TEST LEVELS AND GRADE RANGES

Level     Range

 13       2.6 - 3.9
 14       3.6 - 4.9
 15       4.6 - 5.9
 16       5.6 - 6.9
 17       6.6 - 7.9
 18       7.6 - 9.9
 19       9.6 - 12.9



The Montgomery County Public Schools will give Level 13 in Grade 3, Level 15 in Grade 5, Level 18 in Grade 8 and Level 19 in Grade 11.
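
The overlap between the recommended ranges can be checked with a short lookup. The following is only an illustrative sketch: the function name `levels_for` and the reading of a value such as 3.7 as "grade 3, seventh month" are assumptions made for the example and are not stated in the handbook. The ranges and the MCPS grade assignments are copied from Table 1.3 and the sentence above.

```python
# Recommended grade ranges per test level, transcribed from Table 1.3.
LEVEL_RANGES = {
    13: (2.6, 3.9),
    14: (3.6, 4.9),
    15: (4.6, 5.9),
    16: (5.6, 6.9),
    17: (6.6, 7.9),
    18: (7.6, 9.9),
    19: (9.6, 12.9),
}

# Levels the handbook says MCPS will administer, keyed by grade.
MCPS_LEVEL_BY_GRADE = {3: 13, 5: 15, 8: 18, 11: 19}


def levels_for(grade_placement):
    """Return every level whose recommended range contains the placement.

    grade_placement is a hypothetical decimal grade value (assumed here to
    mean grade plus month, e.g. 3.7).
    """
    return [
        level
        for level, (low, high) in LEVEL_RANGES.items()
        if low <= grade_placement <= high
    ]


if __name__ == "__main__":
    print(levels_for(3.7))  # a placement that falls in two overlapping ranges
    print(levels_for(5.0))  # a placement that falls in a single range
```

Each level that MCPS administers falls inside the publisher's recommended range for the grade in which it is given.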



TABLE 1.4

CATEGORY OBJECTIVES BY LEVEL FOR THE CALIFORNIA ACHIEVEMENT TESTS

[The level-by-level marks (x) for this part of the table are illegible in the source; only the objectives are listed.]

Test/Category Objectives

Phonics Analysis (Level 13 only)
    Consonant Clusters/Digraphs
    Short, Long Vowels/Vowel Combinations
    Diphthongs
    Variant Vowels/Vowel Combinations

Structural Analysis (Level 13 only)
    Compound Words/Syllables/Contractions
    Base Words/Affixes

Reading Vocabulary
    Same Meaning
    Opposite Meaning
    Multimeaning

Reading Comprehension
    Recall of Facts
    Inferred Meaning
    Character Analysis
    Figurative Language
    Author Attitude/Position
    Techniques of Persuasion
    Real/Unreal Elements

Spelling
    Consonant Phonemes/Graphemes
    Vowel Phonemes/Graphemes
    Morphemic Units



Table 1.4 (Continued)

[The level-by-level marks (x) for this part of the table are illegible in the source; only the objectives are listed.]

Test/Category Objectives

Language Mechanics
    Capitalization of I/Proper Nouns
    Capitalization of I/Proper Nouns/Adjectives
    Capitalization of Beginning Words/Titles
    Punctuation of End Marks
    Punctuation of End Marks/Colon/Semicolon
    Punctuation of Comma
    Punctuation of Quotation Marks

Language Expression
    Pronouns
    Verbs
    Adjectives
    Subjects/Verbs
    Modifying Words
    Modifying/Trans. Words
    Complete/Incomplete/Run-on
    Verbosity/Repetition
    Misplaced Modifiers/Nonparallel
    Paragraph Sequence
    Paragraph Sequence/Topic Sentence
    Paragraph Sequence/Topic, Concluding Sentence

Mathematics Computation
    Addition
    Subtraction
    Multiplication
    Division



Table 1.4 (Continued)

                                            Level
Test/Category Objectives            13  14  15  16  17  18  19

Math Concepts and Application
    Numeration                       x   x   x   x   x   x   x
    Number Theory                        x   x   x   x       x
    Number Theory/Sentences                                  x
    Number Sentences                     x   x   x
    Number Sentences/Properties      x               x       x
    Number Properties                    x   x   x
    Common Scales                    x   x   x
    Geometry                         x   x           x   x   x
    Measurement                      x   x           x   x   x
    Geometry/Measurement                         x
    Functions and Graphs                             x   x   x
    Graphs                           x   x       x
    Geometry/Measurements/Graphs             x
    Story Problems                   x   x   x   x   x   x   x

Reference Skills
    Title Page/Copyright Page            x
    Table of Contents                    x
    Index                                x
    Dictionary Page                      x   x   x   x       x
    Map                                  x   x   x   x   x   x
    Table                                    x   x   x   x
    Library Catalog Cards                    x   x
    Diagram                                          x
    Form                                                 x   x
    Readers' Guide                                       x   x



1C. QUESTION FORMATS

This section provides examples of test questions in each subtest, organized by test levels.

The question formats change among subtests and sometimes within a subtest. The latter case is especially important to note because, in many cases, students cannot be given new verbal instructions when a format changes within a subtest. Another factor to note is that the questions for some subtests of Level 13 must be read aloud to the students. Test administrators must read these questions very carefully and say no more than the Examiner's Manual requires. In reading this chapter, one should be aware of the level to be used with a specific class because of the changes noted above.

The format examples presented here generally use easy questions and do not reflect the level of the questions on the test. The correct answer for each example is indicated by an asterisk (*).



Phonic Analysis

Level 13

The following two formats are used for the subtest, Phonic Analysis, given only on this level.

1. The student is to find the word that has the same beginning (or ending) as the word given by the teacher.

   Example: The teacher says "shy . . . shy"

      ( ) steeple
      (*) ship
      ( ) scrap

2. The student should read the word with the underlined part and then choose the word with the same vowel sound.

   Example:

      tree     ( ) rip     (*) deep     ( ) tray     ( ) die



Structural Analysis

Level 13

The following six formats are used on the subtest, Structural Analysis, given only on this level.

1. The student is given a word and asked to find the word from the list that could be combined with the first word to make another word.

   Example:

      up       ( ) road     ( ) city     ( ) school     (*) stairs

2. The student is asked to count the number of syllables in the word given on the left.

   Example:

      doctor   ( ) 1     (*) 2     ( ) 3     ( ) 4

3. The student is given an underlined word and asked to choose the word pair that means the same thing.

   Example: we're

      ( ) we were
      (*) we are

4. The student is asked to choose the word whose underlined part is the base word.

   Example:

      (*) dark er
      ( ) small est
      ( ) un done
      ( ) call ing

5. The student is asked to choose the word that has the prefix underlined.

   Example:

      ( ) continue
      ( ) office
      (*) redo
      ( ) apart

6. The student is asked to select the word that has the suffix underlined.

   Example:

      ( ) car eful
      ( ) readable
      (*) friendly
      ( ) brig hten



Reading Vocabulary

Levels 13-19

The following three formats are included on these levels.

1. The student is to choose the word with the same meaning as the underlined word in the phrase.

   Example: choose the answer

       a. fake
      *b. select
       c. read
       d. write

2. The student must choose the word with the opposite meaning of the underlined word in the phrase.

   Example: an interesting book

       a. long
      *b. boring
       c. good
       d. green

3. The student is asked to choose one of three sentences which uses the underlined word the same way as in the given definition.

   Example: suddenly, or briskly

       a. She closed the snap on her jacket.
       b. He began to snap at his friend.
      *c. The cord broke with a snap.



Reading Comprehension

Level 13

The following formats are used on this level.

1. The student is asked to complete the sentence.

   Example:

      An orange is something to

      ( ) see
      (*) taste
      ( ) feel

2. The student is given three sentences. The student must decide which sentence tells about something that could happen.

   Example:

      ( ) The cat walked across the street.
      ( ) The boy spread his wings.
      ( ) The bird sang in the tree.



Levels 13-14

This format is also included on these two levels.

3. The student is asked to find the word that is used in a similar way as the underlined word(s) in the given sentence.

   Example:

      The boy is in a fog.

       a. lost
      *b. confused
       c. wet
       d. cold

Levels 13-19

This format is used on all levels.

4. The student is asked to read a passage carefully, and then answer the questions that follow. There will be several different kinds of passages.

   Example:

      When the children woke up, they were excited because it had snowed during the night. They looked forward to making a snowman after breakfast.

      When did it snow?

       f. last week
       g. yesterday
      *h. during the night
       j. tomorrow



Spelling

Level 13

The following format is used on this level only.

1. The student must decide if the underlined word in the sentence is right or wrong.

   Example:

      The dog barked.          Right     Wrong
                                (*)       ( )

Levels 14-19

This format is used for all other levels.

2. The student is given a complete sentence with two or three of the words underlined. The student must decide if one of these underlined words is misspelled. If not, he must choose the word "none."

   Example:

      The banker washed his durty car.     None
          a                 *b    c



Language Mechanics

Levels 13-19

The following two formats are used on all levels.

1. The student must indicate the section that contains a word to be capitalized or choose the word "none."

   Example:

      Billy told / his friend / to meet him / at gino's.     None
          a            b            c            *d            e

2. The student must decide which punctuation mark, if any, is missing.

   Example:

      What time is it

      None

      *c

Language Expression

Levels 13-19

The following two formats are used on all levels.

1. The student is asked to choose a word or words that best complete a sentence.

   Example: Give the test to ______ children.

       a. them
      *b. those

2. The student is asked to choose a part of a sentence that is the subject, and the part of the sentence that is the verb.

   Example: subject

      She     saw     the stars     last night.
      *a       b          c             d

   Example: verb

      She     saw     the stars     last night.
       a      *b          c             d



Levels 14-16

This format is also included on the above levels.

3. The student is asked to recognize a complete sentence, an incomplete sentence or a run-on sentence.

   Example:

      If the telephone does not ring.

       a. run-on sentence
       b. complete sentence
      *c. not a complete sentence

Levels 14-19

This format is also included on the above levels.

4. The student is given a list of sentences, and must decide the best order of the sentences which make up a paragraph.

   Example:

      1. Then he went to school.
      2. Tim woke up at 6 o'clock.
      3. He got out of bed.
      4. He got dressed and ate breakfast.

       a. 1, 2, 3, 4
      *b. 2, 3, 4, 1
       c. 2, 4, 3, 1
       d. 4, 3, 2, 1

Levels 16-19

This format is also included on the above levels.

5. The student is given a sentence that is called the topic sentence. He/She must choose the pair of sentences that develops the topic of the given sentence best.

   Example: The play was a success for the students in our school.

      *a. The students worked very hard at all rehearsals. They performed well during the performance.
       b. This was the second time the school had done "Romeo and Juliet." It was a favorite.
       c. The school sponsors many extracurricular events. The play is one of them.
       d. The play was performed at night. It lasted two hours.

The following two formats are also included on the above levels.

6. The student is supplied with three sentences. The student must choose the one that is most clearly expressed.

   Example:

       a. The teacher helped the boy to sit down with the broken arm.
      *b. The teacher helped the boy with the broken arm to sit down.
       c. The teacher helped to sit down the boy with the broken arm.

7. The student must choose a concluding statement from 4 alternatives, after reading the given paragraph.

   Example:

      Traditionally people bought their necessary purchases from merchants in various parts of the cities. With the development of the suburban shopping malls, people were able to find everything in one area.

      *a. The shopping malls are hurting businesses in the cities.
       b. Many shopping malls also have several movie theaters.
       c. Many people live in the cities.
       d. Shopping malls are usually close to large housing areas.



Mathematics Computation

Levels 13-19

There are four formats included on all levels.

The student is given exercises of addition, subtraction, multiplication and division. Each operation is presented in two ways, horizontally and vertically, and "none of the above" is always an alternative.

1. Addition

   Example: 326 + 2 =

       a. 324
       b. 163
       c. 652
      *d. 328
       e. None of the above

   Example:   224
            +   4

       a. 220
       b. 56
       c. 896
      *d. 228
       e. None of the above

2. Subtraction

   Example: 404 - 400 =

       a. 100
       b. 804
      *c. 4
       d. 8
       e. None of the above

   Example:   320
            - 300

       a. 100
       b. 620
      *c. 20
       d. 40
       e. None of the above

3. Multiplication

   Example: 50 x 2 =

       a. 25
      *b. 100
       c. 48
       d. 52
       e. None of the above

   Example:    16
             x  4

       a. 4
       b. 164
       c. 12
       d. 20
      *e. None of the above

4. Division

   Example: 48 ÷ 4 =

      *a. 12
       b. 24
       c. 44
       d. 52
       e. None of the above

   Example: 2)10

      *a. 5
       b. 20
       c. [illegible]
       d. 12
       e. None of the above



Mathematics Concepts and Application

Levels 13-19

One format is used on the subtest, Mathematics Concepts and Application, on all levels.

1. The student is asked to choose the correct answer to a question which sometimes uses a picture, table or diagram to illustrate the problem. The answers do not always require a numerical response.

   Example:

      [Figure: a ruler marked in inches, with an X above one of the marks]

      How many inches are indicated by the X?

       a. 6
       b. 1
      *c. 3
       d. 4
       e. None of the above



Reference Skills

Levels 14-19

One format is used on the subtest, Reference Skills, on the above levels. There is no Reference Skills subtest on Level 13.

1. The student is given a sample reference material and then asked to answer questions that relate to it. Several different samples are used throughout the test. The samples include maps, indexes, tables, dictionary pages, library catalog cards, diagrams, forms, title pages and readers' guides.

   Example:

      GROCERY ITEMS        COST
      eggs/lb.             $ .87
      milk/half gal.         .92
      bread/loaf             .75
      cheese/lb.             .99

      How much does a loaf of bread cost?

       a. $ .87
       b.   .92
      *c.   .75
       d.   .99



CHAPTER 2

REPORTING TEST RESULTS

This chapter contains information about reports on test performance that are provided to schools. Additional test information is reported each year in the Annual Test Report. The discussion of each report covers three areas: the questions that can be answered by the report, an explanation of the data reported, and a description of how to use the report. Technical terms used here are explained in Chapter 3.



2A. SCHOOL FREQUENCY DISTRIBUTION AND MEAN SCORE REPORT

These reports can be used to answer the following questions.

1. What is the average or typical test score for a school?

2. Does the school have subject areas in which it is performing especially well or especially poorly?

3. How much variation is there in the test scores for a school?

Data Reported

The School Frequency Distribution contains the number (frequency) of students attaining each raw score on each subtest and total. Table 2.1 is a sample report that illustrates the parts of the report that will always be provided. Presented with each raw score is its scale score, national percentile rank, national stanine and normal curve equivalent. MCPS percentile ranks and stanines may also be reported. These are computed each year and thus are not directly comparable from year to year because they are based on different students each time. The final column on the report presents the cumulative frequency. This is the number of students in the school who scored at or below the raw score listed in that row.

At the bottom of the report of each subtest, the mean and standard deviation are presented for each score for which they can be computed. To the left of the raw score is an indication of where the median (MED) and first (Q1) and third (Q3) quartiles fall.



TABLE 2.1
SAMPLE FREQUENCY DISTRIBUTION

                                    National     Normal
        Raw      Scale   National   Percentile   Curve                    Cumulative
        Score    Score   Stanine    Rank         Equivalent   Frequency   Frequency

        30       622       9          99           99            1           12
        28       589       8          92           80            1           11
Q3      25       553       7          86           73            2           10
        24       546       6          76           65            1            8
MED     22       530       5          57           54            3            7
Q1      20       517       4          35           42            2            4
        16       492       2          10           23            2            2

MEAN    22.50    539.25                            56.83
SD       4.07     35.89                            21.7141
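
The summary rows of the sample report can be reproduced from its frequency column. The sketch below is illustrative only: the function names are invented for the example, and the use of a population standard deviation (dividing by N rather than N - 1) is an assumption, adopted because it agrees with the values printed in Table 2.1 to two decimal places.

```python
from math import sqrt

# (raw score, normal curve equivalent, frequency), transcribed from Table 2.1.
ROWS = [
    (30, 99, 1), (28, 80, 1), (25, 73, 2), (24, 65, 1),
    (22, 54, 3), (20, 42, 2), (16, 23, 2),
]


def cumulative_frequencies(rows):
    """Map each raw score to the number of students scoring at or below it."""
    running = 0
    result = {}
    for raw, _nce, freq in sorted(rows):  # lowest raw score first
        running += freq
        result[raw] = running
    return result


def mean_and_sd(pairs):
    """Frequency-weighted mean and population SD of (value, frequency) pairs."""
    n = sum(freq for _value, freq in pairs)
    mean = sum(value * freq for value, freq in pairs) / n
    variance = sum(freq * (value - mean) ** 2 for value, freq in pairs) / n
    return mean, sqrt(variance)


raw_mean, raw_sd = mean_and_sd([(raw, freq) for raw, _nce, freq in ROWS])
nce_mean, nce_sd = mean_and_sd([(nce, freq) for _raw, nce, freq in ROWS])

if __name__ == "__main__":
    print(cumulative_frequencies(ROWS))
    print(round(raw_mean, 2), round(raw_sd, 2))
    print(round(nce_mean, 2), round(nce_sd, 2))
```

The cumulative frequency of the highest raw score equals the number of students tested, which is a quick check that no row of a report has been misread.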






The mean scores¹ and their national percentile ranks are listed in the Mean Score Report that is published as part of the Annual School Progress Report and the countywide Annual Test Report. A sample of the Mean Score Report is shown in Table 2.2.

¹ The mean scale score is shown in this example instead of the mean grade equivalent that was reported for the Iowa Tests of Basic Skills. The scale score is the recommended score to use for this purpose because it is on an equal-interval scale.



TABLE 2.2

SCHOOL MEAN SCORES

                                        Grade 3                 Grade 5
                                    Scale   Percentile      Scale   Percentile
                                    Score   Rank            Score   Rank
                                            (50)*                   (50)*

TOTAL BATTERY                        387      68             473      69
Phonics Analysis                     [?]      [?]
Structural Analysis                  [?]      71
Reading Vocabulary                   375      42             452      46
Reading Comprehension                416      63             498      68
TOTAL READING                        389      [?]            [?]      [?]
Spelling                             437      65             546      77
Language Mechanics                   460      65             546      77
Language Expression                  446      64             536      78
TOTAL LANGUAGE                       440      66             536      79
Math Computation                     348      70             451      65
Math Concepts and Applications       410      72             479      72
TOTAL MATH                           381      72             465      69
Reference Skills                                             523      78

*Mean for the national norm group for the Total Battery
[?] = illegible in the source

Uses of These Reports



The typical score for a school can be the mean or median; generally they are equal or close to being equal. The typical score can be used to determine the strengths and weaknesses in each school's program. Percentile ranks (PR) of the typical scores should be compared to make this determination. However, PRs can only indicate which score is higher. The size of differences between PRs should not be used. If one wants to compare differences between subtests (e.g., Is the difference between Math Computation and Reading Comprehension larger than the difference between Language Mechanics and Spelling?), the mean or median computed using the normal curve equivalent (NCE) scale² should be used.
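
The reason differences should be taken on the NCE scale rather than between percentile ranks can be seen numerically. The sketch below uses the standard definition of the normal curve equivalent, NCE = 50 + 21.06 z, where z is the normal deviate corresponding to the percentile rank. That formula is general testing knowledge and is not quoted from this handbook, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_rank_to_nce(percentile_rank):
    """Convert a national percentile rank (1-99) to a normal curve equivalent.

    Assumes the standard definition NCE = 50 + 21.06 * z, which makes
    NCEs of 1, 50 and 99 coincide with percentile ranks of 1, 50 and 99.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 21.06 * z


if __name__ == "__main__":
    # The same 10-point gap in percentile rank is a different NCE gap
    # near the middle of the scale than it is near the top.
    middle_gap = percentile_rank_to_nce(60) - percentile_rank_to_nce(50)
    upper_gap = percentile_rank_to_nce(90) - percentile_rank_to_nce(80)
    print(round(middle_gap, 1), round(upper_gap, 1))
```

Rounded to whole numbers, this conversion agrees with the percentile rank and NCE columns of the sample report in Table 2.1 (for example, a percentile rank of 92 corresponds to an NCE of 80).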

' * • • * » 

The most meaningful indicator of score variation in a school can be obtained by using the range of scores between the first (Q1) and third (Q3) quartiles, called the quartile range. This shows where the middle fifty percent of the scores in the school were. This range can indicate if most of the students in the school have a similar achievement level (a homogeneous school) or if the achievement levels are spread over a wide range (a heterogeneous school). The difference between Q1 and Q3 should be computed using NCE scores.
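
As an illustration, the quartile range for the sample report in Table 2.1 can be computed as follows. The rule used to locate a quartile (the lowest score whose cumulative frequency reaches the given fraction of the students) is an assumption; it reproduces the Q1, MED and Q3 markers printed in Table 2.1, but the handbook does not state how the report places them. The function names are hypothetical.

```python
# (normal curve equivalent, frequency), lowest score first, from Table 2.1.
NCE_ROWS = [(23, 2), (42, 2), (54, 3), (65, 1), (73, 2), (80, 1), (99, 1)]


def score_at_fraction(rows, fraction):
    """Lowest score whose cumulative frequency reaches fraction * N."""
    total = sum(freq for _score, freq in rows)
    running = 0
    for score, freq in rows:
        running += freq
        if running >= fraction * total:
            return score
    raise ValueError("fraction must be between 0 and 1")


def quartile_range(rows):
    """Return (Q1, Q3, Q3 - Q1) on whatever scale the rows are expressed in."""
    q1 = score_at_fraction(rows, 0.25)
    q3 = score_at_fraction(rows, 0.75)
    return q1, q3, q3 - q1


if __name__ == "__main__":
    print(quartile_range(NCE_ROWS))
```

For the sample data the middle fifty percent of students fall between NCEs of 42 and 73, a quartile range of 31 NCE points.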



The School Mean Score Report brings together the typical school scores on all subtests so that the identification of strengths and weaknesses is apparent. As explained previously, this identification should be done by comparing the percentile ranks of each subtest. The scale scores cannot be used for this purpose because they do not indicate the same level of achievement across subtests. The results in Table 2.2 show that the school may want to take a close look at how they teach vocabulary because the percentile rank for that subtest is somewhat below the others.

² Percentile ranks are not on an equal interval scale and thus a 10 point difference does not have the same meaning at all points on the scale. NCEs are on an equal interval scale. See Chapter 3 for additional discussion.



Comparison of the mean scores across grades should be done with caution. Such a comparison provides descriptive information only and does not provide information about program effectiveness. This is because each grade group is made up of different students with different ability levels and backgrounds. The score differences can be caused by these factors and not be related to how well the students are taught. A better way to use test data to look at program effectiveness is explained in Section 2B which deals with longitudinal analysis.




2B. REPORTING DATA FROM LONGITUDINAL ANALYSES



The longitudinal analyses of school test data can be used to help answer
the following questions:

1. Have students who have been in the same school for at least two
   years been able to maintain, or improve, their standing relative
   to the national norm?

2. How are school test scores affected by student transfers both in
   and out of the school in the years between test administrations?

3. Do the scores of the transferring groups indicate meaningful
   changes in the school's population?



The answer to the first question provides the best information from
norm-referenced test data for looking at the effectiveness of a school
program with regard to the objectives measured by the California
Achievement Tests. Data relating to all three questions can be used to
determine if programmatic changes are needed.

Data Reported

The results are reported as two different types of data: longitudinal (L)
and nonlongitudinal (NL). In the results for School A (Table 2.3) the
longitudinal data are the results from one group of students who were
tested in the same school both years (i.e., for two consecutive test
administrations). The nonlongitudinal data represent results from two
groups of students who were each tested in the school only one year. The
group in the lower grade (3) transferred out of the school sometime after
the first test administration. The group tested in the higher grade (5)
transferred into the school sometime after the first test
administration. Remember that the two nonlongitudinal groups are
composed of completely different students.



TABLE 2.3

SCHOOL A

LONGITUDINAL ANALYSIS: GRADE 3 and 5: CALIFORNIA ACHIEVEMENT TEST

                              Students Tested in This     Students Tested in This
                              School Both Years           School Only One Year

                              Number        Percentile    Number        Percentile
                              Taking  NCE   Rank of       Taking  NCE   Rank of
                Grade  Year   Test    Mean  Mean          Test    Mean  Mean

TOTAL READING     3    1978    47      75     88           33      47     44
                  5    1980    47      74     87           37     +69     82

TOTAL LANGUAGE    3    1978    46      72     85           33      49     49
                  5    1980    46     +83     94           35     +65     76

TOTAL MATH        3    1978    47      76     89           35      53     55
                  5    1980    47      78     91           38     +76     89

TOTAL BATTERY     3    1978    44      75     88           32      50     50
                  5    1980    44      79     92           34     +70     83



Mean scores are presented for both the longitudinal and nonlongitudinal
groups on the California Total Reading, Total Language, Total
Mathematics, and Total Battery. The means are computed and reported
using the Normal Curve Equivalent (NCE) scale.³ Percentile ranks for
the mean scores are also reported because they provide an easy to
understand frame of reference. Also reported are the number of students
in each group. The data for any group with fewer than 35 students should
be viewed with caution. Data are not reported for groups of less than 10
because such results would be very unstable.

The rows in the tables separate grades; the columns separate the L group
from the NL groups.

Use of Longitudinal Data

Analysis of longitudinal data can provide an indication of the
effectiveness of a school's instructional program. Score trends within a
school provide the best information when using these data. To determine
the score trend for the L groups, NCE means should be compared for the
two grades tested. The expected trend is that students will maintain the
same NCE score or percentile rank, within error limits, from one grade to
the other.⁴ Substantial deviations from this expected pattern should
be considered indications of possible strengths or weaknesses in the
school program. Substantial is defined here as greater than 7 NCE
points.⁵

³The NCE scale is used because it is more appropriate for looking
at score differences than are grade equivalent scores or percentile
ranks. This is because on the NCE scale there is an equal interval
between all values. This is not true of grade equivalents or percentile
ranks. Chapter 3 has an extensive discussion of these terms.



The data shown in Table 2.3 for School A demonstrate how to interpret
the report. The increase in Total Language is the only substantial
change for the school. (This is indicated by a "+".) The mean score
increased 11 NCE points, somewhat above the 7 point standard. This could
indicate that the school has an especially strong program for teaching
the punctuation, capitalization, and usage skills measured by the
California. As a group, these students appear to be making satisfactory
progress in the other areas of the California.
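The 7 point standard can be applied mechanically, as in the sketch
below. The function name and data layout are assumptions made for
illustration; the NCE means are the longitudinal values for School A in
Table 2.3.

```python
SUBSTANTIAL = 7  # NCE points, the standard used in the text

def flag_change(nce_first, nce_second, threshold=SUBSTANTIAL):
    """Return '+', '-', or '' for the change between two NCE means."""
    change = nce_second - nce_first
    if change > threshold:
        return "+"
    if change < -threshold:
        return "-"
    return ""

# Longitudinal NCE means for School A: (Grade 3 in 1978, Grade 5 in 1980).
school_a = {
    "Total Reading": (75, 74),
    "Total Language": (72, 83),
    "Total Math": (76, 78),
    "Total Battery": (75, 79),
}

flags = {area: flag_change(first, second)
         for area, (first, second) in school_a.items()}

for area, flag in flags.items():
    print(f"{area}: {flag or 'no substantial change'}")
```

Only Total Language is flagged, which matches the reading of the table
given above.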

Any declines of more than 7 points should be considered indications of
areas where the school may have to put special emphasis. These are
indicated by a "-". The results for School B (Table 2.4) show such a
decline in Total Reading.



⁴This expected trend is sometimes adjusted for reasons explained
later in this section.

⁵Seven NCE points is one-third of a standard deviation. This
standard is often used as an indication of educationally meaningful
change for group data.
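The relationship between NCEs and percentile ranks can be sketched as
follows. The handbook states only that 7 NCE points is one-third of a
standard deviation; the mean of 50 and standard deviation of 21.06 used
here are the conventional definition of the NCE scale and are brought in
as an assumption. Published norm tables may differ from these computed
values by a point because of rounding.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional NCE standard deviation (assumed, see above)

def nce_to_percentile(nce):
    """Percentile rank implied by an NCE under a normal distribution."""
    z = (nce - NCE_MEAN) / NCE_SD
    return 100 * NormalDist().cdf(z)

def percentile_to_nce(percentile):
    """NCE implied by a percentile rank (0 < percentile < 100)."""
    z = NormalDist().inv_cdf(percentile / 100)
    return NCE_MEAN + NCE_SD * z

print(round(nce_to_percentile(75)))   # NCE mean of 75, as in Table 2.3
print(round(7 / NCE_SD, 2))           # 7 NCE points as a fraction of one SD

# The same 10 point percentile difference covers unequal NCE distances.
middle = percentile_to_nce(55) - percentile_to_nce(45)
upper = percentile_to_nce(95) - percentile_to_nce(85)
print(round(middle, 1), round(upper, 1))
```

The last line illustrates why score differences are computed on the NCE
scale: ten percentile points near the middle of the scale span far fewer
NCE points than ten percentile points near the top.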



TABLE 2.4

SCHOOL B

LONGITUDINAL ANALYSIS: GRADE 7 and 9: CALIFORNIA ACHIEVEMENT TEST

                              Students Tested in This     Students Tested in This
                              School Both Years           School Only One Year

                              Number        Percentile    Number        Percentile
                              Taking  NCE   Rank of       Taking  NCE   Rank of
                Grade  Year   Test    Mean  Mean          Test    Mean  Mean

TOTAL READING     7    1978   253      62     71           28      63     73
                  9    1980   253     -48     47           31     -32     20

TOTAL LANGUAGE    7    1978   251      59     67           27      66     78
                  9    1980   251      54     58           31     -34     23

TOTAL MATH        7    1978   254      56     62           28      67     79
                  9    1980   254      53     56           32     -29     16

TOTAL BATTERY     7    1978   248      57     65           28      64     75
                  9    1980   248      51     52           31     -30     17



Additional insight into the meaning of longitudinal results can be
obtained by comparing the trends for a school with those of schools that
had similar starting points (e.g., Grade 7 in the School B example).
"Similar starting points" could be defined as any scores within 7 NCE
points of the score for the school being studied. This comparison can be
helpful because the relationship between level of achievement and
performance on a standardized test is not always the same at all levels
of achievement.



For example, it is likely that groups with very high scores in Grade 7
may tend to show a decline over two or three years. Therefore, if a high
achieving school shows a substantial decline it can be useful (probably
comforting) to know if other high achieving schools show the same trend.
At the same time, if the other schools do not show this trend, the need
for improvement in the school with declining scores is emphasized.

Use of Nonlongitudinal Data

Nonlongitudinal data can be used to assess the effects of student
transfers on school test scores. As with the longitudinal data, the
trend of scores, not the absolute values, is the most useful information
and a change of more than 7 NCE points should be considered a substantial
change. Here, however, substantial changes most likely reflect
population shifts, not the strength or weakness of an instructional
program. Obviously, significant population shifts indicate that programs
might have to be modified to meet the needs of the incoming students.

The results for students tested only one year in School A (Table 2.3)
show a meaningful change in the school population. The score increase
from third to fifth grade was considerably larger than 7 points in all
cases. Additionally, the group of students transferring into the school
(i.e., those in Grade 5) represents a large portion (greater than 40
percent) of the fifth grade in that school. This means that the score
trend will have an effect on the overall school averages. When the NL
and L results are compared, it can be seen that the change in population
is toward more homogeneity in the class tested in Grade 5. The students
who left the school sometime after third grade testing were scoring well
below the stable group who remained in the school for both test
administrations. However, those who transferred in after third grade
testing scored almost as high as the stable group. Thus, the overall
school average will increase and the achievement levels of the students
in Grade 5 are more similar within that group than they were in Grade 3.

The nonlongitudinal results for School B also show substantial
performance differences between students transferring out (Grade 7) and
those transferring in (Grade 9). In this case the trend is a decline.
The overall school average will be affected only a small amount by this
difference because the transfers make up only about 11 percent of the
students tested in each grade. However, this does not mean that these
results should be ignored. The transfer students tested in Grade 9 have
achievement levels somewhat below the other students in the school and
this difference has to be considered when planning instruction. These
students may very well need an instructional program quite different from
that which is appropriate for most of the students in the stable group.



When reviewing longitudinal and nonlongitudinal data one additional
factor should be considered: the overall county trends. The county
longitudinal trend is the difference of scores between two test
administrations for all students in the same school both times. The
county nonlongitudinal trend is the difference of scores between two test
administrations for all students tested only once in a school. This
trend should be used to adjust the expected trend of equal NCE scores
that was previously discussed. This is because that expectation of equal
percentile rank across grades could be affected by factors such as
sampling error in the test norms and varying degrees of match between the
test and curriculum at different grade levels. It is possible that
these factors actually make the fifth grade test a little more difficult
for MCPS students than the third grade test. If such factors are
operating on MCPS test scores it would be unfair to hold to the equal NCE
expectation. Therefore, the 7 point standard should be based on the
difference from the county trend. For example, if the county trend is a
3 point decline, then a school score decline would not be considered
substantial until it was greater than 10 points. On that same test a
substantial increase would be anything greater than 4 points (i.e., more
than 7 points above a 3 point decline).
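The county adjustment amounts to measuring each school's change from the
county trend instead of from zero. A minimal sketch, with a hypothetical
function name and hypothetical school changes, using the 3 point county
decline from the example above:

```python
def substantial_vs_county(school_change, county_trend, standard=7):
    """Classify a school's NCE change relative to the county trend."""
    difference = school_change - county_trend
    if difference > standard:
        return "substantial increase"
    if difference < -standard:
        return "substantial decline"
    return "not substantial"

county_trend = -3  # a 3 point county decline

for school_change in (-11, -9, 4, 5):
    result = substantial_vs_county(school_change, county_trend)
    print(f"school change {school_change:+d}: {result}")
```

With this county trend, a decline must exceed 10 points and an increase
must exceed 4 points before it is treated as substantial.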



2C. PERCENT CORRECT BY OBJECTIVE



The Percent Correct By Objectives Report provides information that can be
used to answer the following question:

     Within each subtest, are there any skill areas (objectives) in
     which the school is performing especially well or especially
     poorly?



Data Reported

Six data elements are reported for each objective on each subtest.
These are defined below.

     Number of Items: the number of questions measuring that
     objective according to the publisher's classification.

     School Percent Correct: the average percent of correct answers
     for the questions measuring that objective, for the school.

     Area Percent Correct: the average percent of correct answers
     for the questions measuring that objective, for the
     administrative area.

     County Percent Correct: the average percent of correct answers on
     the questions measuring that objective, for the county.

     Norm Percent Correct: the average percent of correct answers on the
     questions measuring that objective, for the national norming sample
     when the test was standardized. (This information is not available
     for some standardized tests.)

     County/School Differences: the result of subtracting the county
     percent correct from the school percent correct. If the school
     percent is higher the result will be positive; if it is lower the
     result will be negative.

Use of This Report

The information needed to evaluate a school's performance on objectives
is found in the last column of the report, County/School Differences,
shown in Table 2.5. It is necessary to use these particular data
because, on norm-referenced tests, objectives are measured by questions
of varying degrees of difficulty. One objective may have five easy
questions; another objective may have five very difficult questions.
This means that a school may have 30 percent more correct answers for one
objective than another simply because it is measured by easier questions
and not because that objective is being taught more effectively. In that
type of situation the use of School Percent Correct to determine
strengths and weaknesses will provide misleading information. To
overcome this problem an estimate of the difficulty of the questions
measuring each objective must be determined. The County Percent Correct
is used as this estimate. Thus, if the county average is low, the
school's average can be expected to be low.

It is assumed that when the values in the County/School Difference column
are approximately the same for all objectives, the school is teaching all
of the objectives about equally well. If there is a substantial
difference between the values in this column, then it is possible that
areas of strength or weakness have been identified. Substantial
difference is defined here as 20 percent.⁶ If the difference between
the highest and lowest value in this column is more than 20 percent, the
test results may have identified an objective that is taught especially
well or especially poorly.

Table 2.5 shows a report for the Language Mechanics section of the
California Achievement Test. The school shown had a higher percent
correct than the county on all five objectives; however, this does not
necessarily mean that this school is performing well on all objectives.
Actually, the performance on the first objective is considerably lower,
when compared to the county performance, than on the other objectives.
The school may want to look into how it teaches the skills measured by
that objective.



⁶This represents a statistically significant difference if there
are at least 50 students in the grade tested. If the group is smaller,
one should allow a slightly larger difference.



Table 2.5
LANGUAGE MECHANICS

                     NUMBER   SCHOOL    AREA      COUNTY    NORM      COUNTY/
                     OF       %         %         %         %         SCHOOL
OBJECTIVE            ITEMS    CORRECT   CORRECT   CORRECT   CORRECT   DIFFERENCES

Proper Nouns,
  Adjectives           6        79        79        77        75        + 2

Beginning Words,
  Titles               6        68        65        45        51        +23

End Marks              7        80        60        56        69        +24

Comma                  6        77        50        47        53        +30

Quotation Marks        3        92        65        68        56        +24
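The County/School Differences column and the 20 percent rule can be
checked with a short calculation. The sketch below uses the school and
county percents from Table 2.5; the function names are hypothetical, and
the county value for the first objective is taken as 77, which is what
the printed difference of +2 implies.

```python
# (objective, school % correct, county % correct) from Table 2.5
table_2_5 = [
    ("Proper Nouns, Adjectives", 79, 77),
    ("Beginning Words, Titles", 68, 45),
    ("End Marks", 80, 56),
    ("Comma", 77, 47),
    ("Quotation Marks", 92, 68),
]

def county_school_differences(rows):
    """School percent correct minus county percent correct, per objective."""
    return {name: school - county for name, school, county in rows}

def spread(differences):
    """Gap between the highest and lowest County/School Difference."""
    return max(differences.values()) - min(differences.values())

diffs = county_school_differences(table_2_5)
gap = spread(diffs)
lowest = min(diffs, key=diffs.get)

print(diffs)
print(f"spread = {gap}; substantial = {gap > 20}; lowest = {lowest}")
```

The spread exceeds 20, and the objective with the lowest difference is
the first one, so the calculation singles out the same objective that
the discussion of Table 2.5 does.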



Table 2.6 presents the results for a school in which performance was below
the county for all objectives. This does not necessarily mean the
school's instructional program is poor. Students in this school could be
scoring below the county average for various reasons that have little to
do with the quality of the instructional program for the year. The
knowledge and skills students bring to school are examples of factors over
which the school has little control. The results for this school indicate
that the comma objective is learned better than the other objectives on
this subtest. The County/School Difference for that objective is 28
points higher than the low difference of -30 for the last
objective, Quotation Marks.




Table 2.6
LANGUAGE MECHANICS

                     NUMBER   SCHOOL    AREA      COUNTY    NORM      COUNTY/
                     OF       %         %         %         %         SCHOOL
OBJECTIVE            ITEMS    CORRECT   CORRECT   CORRECT   CORRECT   DIFFERENCES

Proper Nouns,
  Adjectives           6        52        79        77        75        -25

Beginning Words,
  Titles               6        22        65        45        51        -23

End Marks              7        30        60        56        69        -26

Comma                  6        45        50        47        53        - 2

Quotation Marks        3        38        65        68        56        -30




Additional Considerations

The number of questions measuring an objective should be considered when
using this report. The guideline often used is that, if an objective is
measured by fewer than 5 items, the coverage of the objective is
questionable. Results for any such objectives should be used with caution.



2D. INDIVIDUAL TEST REPORT

The Individual Test Report presents information that can be used to
answer the following question.

     Does an individual student have any subject areas in which
     he/she shows particular strength or weakness?



Data Reported

The student's test performance on the subtests and combinations of
subtests is presented in three ways. These are briefly explained below
and in greater detail in Chapter 3.

     Stanines: The stanine range is divided into 9 units. Each stanine
     includes several possible test scores. This scale is often used to
     report results for individual students because it generally is not
     affected by small score variation caused by test error.⁷

⁷See Standard Error of Measurement in Chapter 3.




     National Percentile Rank: The percentile rank range is divided into
     99 units. Percentile ranks indicate the percentage of students in a
     group who scored the same or less than the student whose scores are
     being reported. In the case of the California Achievement Tests, the
     reference group is the sample of students on whom the test was
     developed in 1977. A score of 65 indicates that the student did as
     well as or better than 65 percent of the students in the reference
     group.

     Score Band: Score bands represent a range of test scores around a
     student's score. They indicate the amount that a student's score may
     reasonably be expected to vary due to test error. If a student was
     feeling poorly the day of the test, his score may have fallen to the
     lower end of the band. If a student made a couple of lucky guesses,
     her score may have been at the upper end of the band.

Use of the Report

The question regarding an individual's strengths or weaknesses can be
answered in two ways. The first is a comparison of the score bands for
each subtest with the score band for the Total Battery. This provides
information about the subject areas in which the student is strongest and
weakest. The second way of answering this question uses the stanines (or
percentile ranks) and provides an indication of strengths and weaknesses
compared to the national norm group.

The rationale for comparing the subtest bands to the Total Battery band
is based on the assumption that the Total Battery score represents the
student's overall level of achievement and that any marked deviation from
that overall level is noteworthy. The results of this comparison can be
interpreted as follows:

1. A subtest score band which is completely below the band for the
   Total Battery indicates that the student appears to be doing
   poorer in learning the skills in that subtest than in learning
   other skills measured by the California Achievement Tests.

2. A subtest score band which is completely above the Total Battery
   score band indicates that the student appears to be doing better
   in learning the skills measured by that subtest than in learning
   the other skills measured by the California Achievement Tests.

3. A subtest score band that overlaps the Total Battery score band
   is an indication that the student's achievement in the skills
   measured by that subtest is about average for that student.
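The three cases above reduce to a comparison of band endpoints. The
sketch below is illustrative only: the function name is hypothetical,
each band is represented as a (low, high) pair of percentile ranks, and
the example bands are invented, not taken from Figure 2.1.

```python
def compare_bands(subtest_band, total_band):
    """Compare a subtest score band with the Total Battery score band.

    Each band is a (low, high) pair of percentile ranks.
    """
    sub_low, sub_high = subtest_band
    total_low, total_high = total_band
    if sub_high < total_low:
        return "below"      # case 1: completely below the Total Battery band
    if sub_low > total_high:
        return "above"      # case 2: completely above the Total Battery band
    return "overlaps"       # case 3: about average for this student

total_battery = (50, 68)    # hypothetical Total Battery band

print(compare_bands((15, 35), total_battery))
print(compare_bands((74, 90), total_battery))
print(compare_bands((40, 55), total_battery))
```

Bands that merely touch at an endpoint are treated as overlapping here;
that choice is an assumption, since the text does not address ties.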



In Figure 2.1, Reading Vocabulary is an area in which the student
probably needs help, while Mathematics Concepts and Applications is an
especially strong area.

High and low stanines provide indications of strengths and weaknesses
regardless of the location of the score bands. Stanines of 3 or lower
should be viewed as an indication that the student may be having trouble
learning what is measured by the subtest. Stanines of 7 or higher mean
the student is doing very well in learning the skills measured by that
subtest. This is true even if the band for the subtest is completely
below the Total Battery band. Such a pattern (i.e., subtest with a high
stanine and score band below the Total Battery score band) can be
interpreted as indicating two things: (1) the student has high overall
achievement and (2) his performance on this subtest is weaker than on
the other subtests.

FIGURE 2.1
INDIVIDUAL TEST REPORT

[Figure: Montgomery County Public Schools, Department of Educational
Accountability report form. Rows: Calif. Total; Calif. Subtests (Reading
Voc., Reading Comp., Total Reading, Spelling, Lang. Mechanics, Lang.
Expression, Total Language, Math Comput., Math Conc. & App., Total Math,
Ref. Skills). Each row shows a national percentile rank and a score band
plotted on a National Percentile Rank scale marked 5, 10, 20, 30, 40, 50,
60, 70, 80, 90, 95, 98, 99. Below the scale: Stanine Bands and Percentile
Ranges for Each Stanine.]

Use of Report With Parents

A copy of the "Individual Test Report" is sent to the parent or guardian
for each student. Below are some possible questions parents may have
upon review of this report.



There are at least two technical questions that parents are likely to ask
with regard to the reporting of band scores. First, they may want to
know why the bands for some subtests are larger than for others. This
difference in width occurs because some subtests have larger components
of measurement error. That is, they may contain more difficult questions
that will cause students to guess more often. Guessing means students
may get credit for knowing something they do not know. The length of the
band can be shortened if the score is at the top or bottom of the
percentile scale. This is because the student scored so high (or low)
that even when the error factor is subtracted (or added) the band still
does not extend beyond the 99th (or 1st) percentile.



Another question could come from parents who notice that for some
subtests the student's actual score is not exactly in the middle of the
band. This variation in score position occurs because percentile ranks
are not an equal distance apart. A more thorough explanation relating to
this situation can be found in Chapter 3 in the discussion of percentile
ranks.

2E. PROMINENT GUIDELINES FOR INTERPRETING TEST DATA

When reviewing test data, it may be helpful to employ the following
guidelines. These may be particularly useful when interpreting test
results to parents.

o  Individual test scores are only estimates of student
   performance; the scores are subject to substantial measurement
   errors. This is why score bands are much more accurate ways of
   presenting test results than are numeric values.

o  Norm-referenced test scores indicate a child's relative
   achievement or performance status compared to students of
   similar age and grade level in the nation.

o  Percentile ranks are derived by comparing a child's scores with
   those of students in the nation selected to establish the norms
   at some time in the past. The child's scores are not being
   compared to those of students currently in his or her grade.

o  Norm-referenced tests provide an estimate of which students know
   the most about the content included in the test. These tests do
   not define in a very specific way what students know.
   Criterion-referenced tests are needed to serve the latter
   purpose.

o  Standardized tests measure only some of the basic content skills
   common to curricula throughout the country. They do not measure
   the full curriculum presented in a particular class, school, or
   district.

o  Percentile ranks do not refer to the percent of questions
   answered correctly, but to the percent of students whose
   performance an individual student has equaled or surpassed.

o  Tests are not perfect. A percentile rank of 95 does not
   represent performance that is always superior to that
   represented by a percentile rank of 94 or 93.

o  Test scores are not sufficiently precise to permit the ranking
   of students, except for very large differences (e.g., the fourth
   stanine vs. the seventh stanine).

o  A stanine score of 7 or higher generally reflects strength in
   the area tested, while a stanine score of 3 or lower may
   indicate a potential problem.

o  No one is expected to know everything on a norm-referenced
   test. Some items are purposely designed to be difficult.

o  Avoid comparing a child's scores with those of his or her
   friends because of error in the scores and the confidentiality
   of the test scores.



This chapter can serve as a reference for the technical testing terms used
throughout this handbook and in other materials dealing with testing. The
terms are defined; their uses are stated; and precautions about their
interpretation are provided. The terms are listed in alphabetical order.



CRITERION-REFERENCED TEST (CRT)

Definition

A test based on specific learning objectives (or teaching objectives),
usually within a narrow range of subject matter or skills. The tests
are designed to measure the knowledge or skills the student has
attained. The Maryland Functional Reading Test (MFRT) is an example of
a CRT.

Use

CRTs provide information about the extent to which the student has
attained the learning objective(s).

Precautions

1. CRTs are often designed so a student can answer all or almost all
   of the questions correctly or incorrectly depending on the extent
   to which the student has attained the skills being measured. They
   are not designed to yield information about different levels of
   achievement and, therefore, cannot usually be used to rank
   students on specific skills.

2. To be useful measures of specific skills, CRTs must have a
   sufficient number of questions measuring each particular skill
   included on the test. Although what is "sufficient" is not a
   fixed number, there should, in most cases, be at least five
   questions which measure a skill. A test purporting to be a CRT
   which has fewer than five questions per skill should be viewed
   with skepticism.

GRADE EQUIVALENT SCORES (GE)

Definition

The grade equivalent of a given raw score on any test estimates the
grade level at which the typical pupil achieves this raw score. The
digit(s) to the left of the decimal point represent the grade; the
digit to the right of the decimal point represents the month within the
grade according to the following table:



     Number     Month

       0        September
       1        October
       2        November
       3        December
       4        January
       5        February
       6        March
       7        April
       8        May
       9        June-August



An example of how a test publisher might derive grade equivalents can
be useful in understanding GE. The example presented below represents
the best methodology currently in use. Many tests are normed with
fewer samples.

If the publisher is norming a fourth grade test, he will test a
representative sample in Grades 3, 4, and 5. In each grade, the
sample, or two comparable samples, will be tested in the fall
(November) and the spring (April). Thus, the grade levels being tested
are 3.2, 3.7, 4.2, 4.7, 5.2, and 5.7. (Often publishers test only once
a year.)

The average raw test score for the students in each group is computed
and plotted on a graph similar to the one below. The mean scores are
indicated by "." on the graph. All other grade-and-month values are
estimated by interpolation between the means and extrapolation beyond
the means. The GEs beyond the grade range of students in the norming
sample should be regarded as no better than rough estimates.



Figure 3.1

GRADE NORM LINE

[Figure: score (vertical axis, 5 to 35) plotted against grade equivalent
(horizontal axis, 1.7 to 8.2 in half-year steps), showing the grade norm
line and a possible growth pattern.]
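The interpolation and extrapolation step can be sketched as follows.
This is a simplified illustration, not a publisher's actual procedure:
the six mean raw scores are hypothetical, straight-line segments stand
in for whatever smoothing a publisher might apply, and the function name
is invented. The means are assumed to rise from one norming point to the
next.

```python
# Hypothetical mean raw scores at the six norming points (grade.month).
norm_points = [
    (3.2, 12.0), (3.7, 15.0),
    (4.2, 18.0), (4.7, 20.0),
    (5.2, 22.0), (5.7, 23.0),
]

def grade_equivalent(raw_score, points=norm_points):
    """Estimate a GE by straight-line interpolation between norming means.

    Scores outside the normed range are extrapolated from the nearest
    segment, which is why such GEs are only rough estimates.
    """
    if raw_score <= points[0][1]:
        left, right = points[0], points[1]
    elif raw_score >= points[-1][1]:
        left, right = points[-2], points[-1]
    else:
        for left, right in zip(points, points[1:]):
            if left[1] <= raw_score <= right[1]:
                break
    (grade0, score0), (grade1, score1) = left, right
    return grade0 + (raw_score - score0) * (grade1 - grade0) / (score1 - score0)

print(round(grade_equivalent(16.5), 2))  # interpolated between 3.7 and 4.2
print(round(grade_equivalent(24.0), 2))  # extrapolated beyond 5.7
```

Note how the flatter segment at the top of the range turns a one point
raw score change into a larger GE change than the same change lower down.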



Use

GEs provide a familiar referent for test scores.

Precautions

1. The grade equivalent score does not indicate the grade level of
   work that a student can perform. It simply estimates the grade
   level of the typical student in the norming sample achieving a
   given raw score. For example, suppose a fourth grade student has
   a score with a grade equivalent of 5.4 on a fourth grade test.
   This does not mean that a fourth grade student can do work which
   is done in January in the fifth grade. It simply estimates that
   this student did as well on a fourth grade test as the typical
   student in January of the fifth grade. However, remember that if
   the norming sample for the fourth grade test did not include any
   fifth grade students, this estimate is very tentative.

2. Grade equivalent scores should not be added and subtracted because
   they are not an equal distance apart at all points. They are
   developed under an assumption that learning occurs equally during
   the school year. In fact, students tend to learn more at
   different times in the year. From a strict statistical point of
   view, this lack of equal score intervals means that mean GE scores
   should not be computed. However, if the GE scores are converted
   to Normal Curve Equivalent scores, which do have this equal
   interval quality, the mean score computed from the converted
   scores is generally very close to that computed from the GEs,
   especially if the grade equivalents represent a wide range of
   possible scores.

3. The attempt to build a scale based on the assumption of equal
   learning cited in Number 2 above results in differential GE gains
   for raw score changes. What occurs is that a one raw score point
   change may cause a one-month change in GE at one place in the norm
   table and a five-month gain elsewhere. The largest changes in GE
   generally happen in the extremes of the score distribution.

   An example of the unequal GE differences between raw scores is
   shown below. These scores are taken from the ITBS seventh grade
   spelling test.



Grade  Test      Raw Score  Grade Equivalent  Difference in Grade Equiv.

7      Spelling      7            3.5
7      Spelling      8            4.0                  .5
7      Spelling      9            4.4                  .4
7      Spelling     25            8.4
7      Spelling     26            8.5                  .1
7      Spelling     27            8.7                  .2
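The unequal spacing in the spelling example can be checked with a few lines of Python. This is only an illustrative sketch: the raw score and grade equivalent pairs are copied from the example, while the name `ge_gain_per_point` is hypothetical and not part of any test publisher's materials.

```python
# Raw score -> grade equivalent pairs copied from the spelling example.
norm_table = [(7, 3.5), (8, 4.0), (9, 4.4), (25, 8.4), (26, 8.5), (27, 8.7)]


def ge_gain_per_point(table):
    """Return (raw_from, raw_to, GE change) for adjacent raw scores."""
    gains = []
    for (raw_a, ge_a), (raw_b, ge_b) in zip(table, table[1:]):
        if raw_b - raw_a == 1:  # skip the jump from raw score 9 to 25
            gains.append((raw_a, raw_b, round(ge_b - ge_a, 1)))
    return gains


gains = ge_gain_per_point(norm_table)
for raw_a, raw_b, gain in gains:
    print(f"raw {raw_a} -> {raw_b}: GE change of {gain}")
```

One additional correct answer is worth five GE months near the bottom of this table but only one or two months near the top.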






4. Grade equivalents generally have a wider range at higher grade
levels. This leads to the situation that a student who has the
same PR in Grades 3 and 5 will probably be further above (or
below) the median in GE terms in Grade 5. This means that if
he/she has a high PR in both grades, the gain in GE terms will be
more than two years. If he/she has a low PR, the gain will be
less than two GEs. Therefore, if a constant expected GE gain were
established for all students, it would be too high for some and too
low for others. The example below from ITBS norms demonstrates
this problem.




Grade Equivalent Change

2.4
2.0
1.5

5. Because a grade equivalent score represents the performance of a
typical student at a given grade level, approximately half of the
students in a nationwide sample would be expected to score below
grade level.

6. Grade equivalents should not be compared across subject areas as
they have different meanings. For example, mathematics is more
grade related than reading; and, therefore, the GEs are generally
less spread out for math than reading.

7. Grade equivalents should not be compared across different tests
because they may have different meanings due to different norming
samples.



INTERQUARTILE RANGE

Definition

Quartiles are scores (points in a distribution) that divide a score
distribution into quarters. Twenty-five percent of the scores are at
or below the first quartile (Q1), 50 percent are at or below the second
quartile (Q2, which is also the median), and 75 percent are at or below
the third quartile (Q3). The interquartile range includes the band of
scores that lies between Q1 and Q3, or the middle 50 percent of the
scores.
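As a rough illustration of the definition, the quartiles of a small set of scores can be computed with Python's standard `statistics` module. The scores are hypothetical, and the "inclusive" method is one of several conventions for locating quartiles, so other tools may give slightly different values.

```python
import statistics

scores = [12, 15, 18, 20, 22, 25, 27, 30, 41]  # hypothetical raw scores

# method="inclusive" treats the data as the whole group, not a sample of it.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
interquartile_range = q3 - q1

print(f"Q1={q1}, Q2 (median)={q2}, Q3={q3}")
print(f"The middle 50 percent of scores spans {interquartile_range} points")
```

The extreme score of 41 has no effect on the result, which is the property the interquartile range is used for.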



Use 



By eliminating the effect of the lowest and highest quarters of the
distribution, the interquartile range provides a measure of how the
typical students in a group performed.



Precaution (s) 



Eliminating the extreme scores may be removing important information
such as the location of pockets of students needing compensatory
services. If the median is close to either quartile, it could
indicate a large number of students at that end of the distribution who
might require such services.



MEAN

Definition

The sum of the scores divided by the number of scores.

Use

The mean is used as a measure of the performance of the "typical"
student in a group.

Precaution(s)

1. In a small group, the mean can be overly influenced by a few
extreme scores. Thus, if a few scores in a distribution are very
low but most are quite high, the mean will be depressed by the low
scores more than the median. In groups where there are a few
extremely low scores, the mean will, therefore, be lower than the
median. Therefore, it is often useful to compare the mean with
the median.

2. Use of the mean provides no information about the spread of scores.
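The effect described in Precaution 1 can be seen in a small sketch. The scores below are hypothetical and were chosen so that two very low scores sit alongside five high ones.

```python
import statistics

scores = [10, 12, 78, 80, 82, 84, 86]  # hypothetical: two very low scores

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

print(f"mean   = {mean_score:.1f}")
print(f"median = {median_score}")
```

The two low scores pull the mean well below the median, even though most of the group scored near 80.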
MEDIAN

Definition

The score that divides a test score distribution in half. Half of the
scores are above the median, half are below. It is the score that has
a percentile rank of 50.

Use

The median is used as a measure of the performance of the "typical"
student in a group.

Precaution(s)

1. See Precaution 1 for mean.

2. Use of the median provides no information about the spread of
scores.






NORMAL CURVE

Definition

A normal curve is a distribution of scores or values which, in graphic
form, is bell-shaped as shown in Figure 3.2. In a normal curve
distribution, the mean and the median are at the same point. The
majority of the scores are clustered around the mean/median.
Sixty-eight percent of the scores are within one standard deviation of
the mean/median, and 95 percent are within two standard deviations.
Scores which are more than three standard deviations from the
mean/median are rather rare, occurring less than 1 percent of the time.
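The percentages quoted in the definition can be verified with the `NormalDist` class in Python's standard library. This is a minimal sketch. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The figures of 68 and 95 percent in the text are rounded values of the first two results.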



Figure 3.2

Comparison of Test Scores

[Figure: a normal curve with three comparative score scales beneath
it, labeled NCEs, Percentiles, and Stanines.]



Uses

Because of its well-documented statistical properties, the normal curve
distribution is often used in reporting test scores as an aid in
interpreting scores of groups or individuals.



Precautions

The normal curve distribution is a statistical or mathematical ideal.
It is not a graphic description of what a particular distribution
should be; distributions which do not conform to the normal curve are
not "abnormal." Many variables can affect the distribution of a
particular set of scores: test content, difficulty of the test items,
suitability of the test for the group to which it is administered.






NORMAL CURVE EQUIVALENT SCORES (NCE) 
Definition 

NCEs divide the normal distribution into 99 segments, units, or scores
(Figure 3.2). Scores range from 1-99, with a mean/median of 50. NCEs
can be related to percentile ranks as shown in the comparative scales
in Figure 3.2.
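The relation between percentile ranks and NCEs can be sketched in Python. The handbook does not give a formula. The constant 21.06 used here is the conventional NCE standard deviation, chosen so that NCEs of 1, 50, and 99 coincide with percentile ranks of 1, 50, and 99. The function name is hypothetical.

```python
from statistics import NormalDist

NCE_SD = 21.06  # conventional NCE standard deviation (not stated in the handbook)


def percentile_rank_to_nce(pr):
    """Convert a percentile rank (1-99) to a Normal Curve Equivalent."""
    z = NormalDist().inv_cdf(pr / 100)  # position on the normal curve
    return 50 + NCE_SD * z


for pr in (1, 25, 50, 75, 99):
    print(f"PR {pr:2d} -> NCE {percentile_rank_to_nce(pr):.1f}")
```

The two scales agree only at 1, 50, and 99. Between those points, percentile ranks bunch together near the middle of the distribution while NCEs stay evenly spaced.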

Uses

1. NCEs can be subjected to arithmetic operations. Therefore, mean
NCEs can be computed,* and differences in NCEs can be compared at
all points in the score distribution.

2. NCEs can be used in analyses of group data (for reasons above).
In addition, NCEs are scaled to reveal small changes, something
which stanine scores will not do consistently because of the large
score range at each stanine point.

Precaution(s)

1. Use of NCEs for evaluating individual performance is to be
done with caution. A change of five NCE units on a test score is
within the error range for individuals on most standardized
tests. However, since NCEs give a false sense of precision (and
hence of security), the careless test user could consider such a
change meaningful.

2. NCEs are difficult to interpret when presented alone. After an
analysis has been performed on the basis of NCEs, results are
often converted to some more readily understandable scale like
percentile ranks.

NORM-REFERENCED TEST (NRT)
Definition

A test designed to rank students according to the number of test items
answered correctly (i.e., according to raw score). Ranking is usually
also done in relation to the performance of a norming sample. The
California Achievement Tests is an example of an NRT.



*In a strict statistical sense, it is probably incorrect to subject
any test scores to arithmetic operations. However, NCEs, standard scores
with an underlying normal distribution, raw scores, and stanines come
closer than any other score scales to having equal-interval properties
which permit arithmetic operations.




Use 

Norm-referenced tests provide information about which students know the
most about the content included on the test.

Precaution(s)



1. A good NRT is designed to enable between 40-70 percent of the
examinees to answer any given item correctly. Many items are
therefore too difficult for a majority of examinees to get right.
This means that most NRTs are not very good tests of what an
individual student knows (as opposed to criterion-referenced
tests). Rather, they are measures of who knows the most about the
test content.

2. NRTs often include only one or two questions which measure
achievement of a given skill or objective. Information about
student performance on a particular objective is, therefore,
usually not very reliable.



NORMS 



Definition



Statistics that describe the test performance of specified groups, such 
as students in a given grade, age range, type of community, etc. 



Use 



Norms provide a way of relating raw scores to a more meaningful score
scale, such as percentile ranks, stanines, grade equivalents, or a
standard score, so that it can be determined how a student performed
relative to a "representative" sample of students similar in some way.



Precaution(s) 



1. Norming samples cannot be perfectly representative of a large
group of students. For most major standardized tests, publishers
use sophisticated sampling procedures to determine the norming
sample. However, there will always be a small error factor. This
means that caution must be used when comparing the scores from two
different tests or even from two levels of the same test because
the levels may not have used the same group of students. The
following is an example of what might happen because of this. If
the students in the norming sample for Test A are brighter than
those in the sample for Test B, the norms for the two tests will
not be equivalent. A student who then takes both tests will be
likely to attain a lower percentile rank on Test A because he/she
is being compared to a brighter group of students on a test which
has "more difficult" norms.



2. Test publishers often provide norms for different times of the
year such as fall, winter, and spring. However, they may not have
used a norming sample at all of these times, which means that some
of the norms are estimates. A test manual should be consulted to
determine when a given test was normed. Estimated norms for any
other time of year should be viewed with caution.

3. Test norms are not necessarily derived every year, and therefore
some norms may be several years old. However, it is common
practice to compare current student performance on a given test
with the performance of the national norming sample. Caution must,
therefore, be exercised in interpreting the meaning of an
individual's status. For example, a student who took a test in
1978 and who achieved a percentile rank of 60 probably did not
score higher than 60 percent of the students taking the test in
1978. Rather, the individual scored higher than 60 percent of the
students in the norming sample who took the test in the past, for
example in 1970.

4. The above considerations may weaken the usefulness of older
norms. If changes have occurred in curricula, current students
may be better prepared in some skills or subjects than were
students in the norming sample, less well prepared, or simply
differently prepared. Thus, comparisons of percentile ranks
across years may be clouded by changing curricula.

5. Norms are derived so that half of the representative group is
expected to be below average. This means that half of the group
will be below grade level, below a percentile rank of 50, and below
the mean. Therefore, it is extremely difficult to have all of the
students in any large group perform above the average.

PERCENTILE RANK (PR)
Definition

The percentage of students in the norming sample who scored at or below
a given score. For example, if a raw score of 30 has a percentile rank
of 78, then 78 percent of the students in the norming sample scored at
or below 30 items correct.
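The definition translates directly into a short calculation. The sketch below uses a hypothetical ten-student norming sample purely for illustration. Real norming samples contain thousands of students, and published norm tables report percentile ranks on a 1-99 scale.

```python
def percentile_rank(raw_score, norming_scores):
    """Percent of the norming sample scoring at or below raw_score."""
    at_or_below = sum(1 for s in norming_scores if s <= raw_score)
    return round(100 * at_or_below / len(norming_scores))


# Hypothetical raw scores earned by a (very small) norming sample.
norming_sample = [18, 21, 24, 25, 27, 28, 29, 30, 33, 36]

print(percentile_rank(30, norming_sample))  # 8 of the 10 scores are at or below 30
```

The result depends entirely on who is in the norming sample, which is why the precautions listed under NORMS apply to every percentile rank.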

Use

PRs provide easily interpretable information about how a given
student's performance on a test compares to the performance of students
in the norming sample.



Precaution(s)

1. PRs should not be added or subtracted because they are not an
equal distance apart at all points. For example, Figure 3.2
clearly shows that an increase of 10 points between percentile
ranks 45 and 55 is not the same distance as an increase of 10
points between percentile ranks 85 and 95. A person would have to
show a larger amount of improvement to achieve the second increase.




2. On a test of fewer than 100 questions, it is not possible for
every whole number of the percentile rank scale to have an
associated raw score. Therefore, in such circumstances, a
one-point increase in raw score can cause an increase of several
percentile rank units. What might appear to be a substantial
increase on the percentile rank scale is really only an increase
of one additional question correct. This caveat applies to
virtually all tests in standardized batteries.

3. Percentile ranks should not be confused with percent of correct
answers (raw scores). They have completely different meanings.

RAW SCORE

Definition

The number of questions or test items answered correctly.

Use

Raw scores can be used to report the number of questions answered
correctly.

Precaution(s)

1. A raw score has no meaning other than the number of items answered
correctly. It provides no interpretative information.

2. Raw scores can be quite misleading when reported by themselves
because the meaning of raw scores differs from test to test. For
example, if one 50-item test is easy and one 50-item test is
difficult, a raw score of 30 on the difficult test might represent
better performance than a raw score of 45 on the easier test.

3. Subjecting raw scores to arithmetic operations (i.e., addition,
etc.) is a questionable procedure. Generally raw scores do not
have the equal interval property required for these operations.
This is because the same raw score can be obtained by different
students who get different combinations of items correct. These
items will most likely vary in their level of difficulty. Thus,
identical raw scores will possibly represent differential levels
of achievement.



RELIABILITY 

Definition

Reliability refers to the extent to which a test is consistent in what
it measures. There are three major types of reliability, all expressed
as a coefficient ranging from 0 (complete lack of consistency) to 1
(perfect consistency).



1. Internal consistency is the degree to which all the questions on a
test measure the same thing. For example, a mathematics test that
measures only addition of fractions will probably have a higher
internal consistency coefficient than one that measures several
different mathematical operations. This would be especially
important for achievement tests that measure specific skills.

2. Stability is the degree to which a person will achieve the same
score on a test that is taken twice within a time period of
anything from a few days to a year or two. This is important in
an instrument which measures a trait like natural ability which is
not expected to change over time.

3. Equivalence is the degree to which a person will achieve the same
score on two forms of the same test. This is important for any
test in which two forms are to be used interchangeably.
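A stability (or equivalence) coefficient is commonly estimated by correlating two sets of scores from the same students. The handbook does not say how its coefficients are computed, so the Pearson correlation below is an assumption, and the scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical scores for six students tested twice, a few weeks apart.
first_testing = [31, 25, 40, 36, 28, 45]
second_testing = [33, 24, 41, 34, 30, 44]

stability = pearson_r(first_testing, second_testing)
print(f"stability coefficient: {stability:.2f}")
```

A value near 1 indicates that students kept nearly the same relative standing on both occasions.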

Use

Reliability is a measure of the quality of a test.

Precaution(s)

The type of reliability appropriate for a given testing situation
should be used.

SCALE SCORE (SS) 

Definition

Scale Scores range from 0 to 999 and provide a link between all levels
of the California Achievement Tests.

Uses

1. Scale scores can be subjected to arithmetic operations like Normal
Curve Equivalent scores. Therefore, means can be computed and
differences in SSs can be compared meaningfully.

2. Scale scores provide a way of comparing scores on different levels
of the California Achievement Tests and, therefore, provide a way
of measuring growth.

3. The capability of comparing results from different test levels
also means that scale scores help to make out-of-level testing
possible. This testing procedure allows for a student to take a
test for a grade other than his own and still have results
(percentile ranks and stanines) based on norms for his grade.




Precaution(s)

1. Scale scores should not be used to compare scores in different
subject areas. They were not developed so that equivalent scores
in two subject areas would indicate equivalent levels of
achievement. Any comparison of scale scores should be done within
subject areas.

2. There are not "typical" scale scores for each grade or test
level. In fact, the ranges of SSs in the various levels overlap
considerably.

SIGN TEST

Definition

A test of statistical significance which is based on the number of
increases (+) and decreases (-) in a set of comparisons. If the
pattern of pluses and minuses deviates substantially from an even
split, the pattern is considered significant.

Use

To determine if a pattern of increases and decreases deviates from an
even split enough to indicate a significant trend.

Precaution 

1. The sign test indicates only if the overall trend of increases and
decreases is significant. It does not provide any information as
to whether individual increases or decreases are significant.

2. The size of a difference is irrelevant. For example, this test
does not differentiate between an increase of 1 point or 30
points. They both simply count as a plus.
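A minimal sketch of the sign test follows. It assumes the usual two-sided form based on the binomial distribution with an even chance of a plus or a minus, and it drops comparisons that show no change. Neither detail is spelled out in the handbook, and the score changes are hypothetical.

```python
from math import comb


def sign_test_p_value(pluses, minuses):
    """Two-sided probability of a split at least this uneven by chance."""
    n = pluses + minuses
    k = min(pluses, minuses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical score changes for ten comparisons; only the sign is used.
changes = [1, 30, 4, -2, 7, 3, 12, 5, 9, 6]
pluses = sum(1 for c in changes if c > 0)
minuses = sum(1 for c in changes if c < 0)

p_value = sign_test_p_value(pluses, minuses)
print(f"{pluses} pluses, {minuses} minuses, p = {p_value:.3f}")
```

The increases of 1 point and 30 points each contribute exactly one plus, which is the point of Precaution 2.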

STANDARD DEVIATION (SD)

Definition

A measure of the dispersion in a set of scores. The closer the scores
cluster around the mean, the smaller the SD will be.

Use

As a measure of the spread in a set of scores, the SD can be used to
assist in determining the degree of importance of score differences.
For example, a difference of 2 points would probably not have much
meaning if the SD were 20 but could be quite important if the SD were
0.5.
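One simple way to apply this idea is to express a score difference in SD units. The sketch below uses the two cases from the example. The function name is hypothetical, and the handbook does not prescribe any particular cutoff for an "important" difference.

```python
def difference_in_sd_units(difference, sd):
    """Express a score difference as a multiple of the standard deviation."""
    return difference / sd


small_effect = difference_in_sd_units(2, 20)
large_effect = difference_in_sd_units(2, 0.5)
print(f"SD of 20:  a 2-point difference is {small_effect} SD")
print(f"SD of 0.5: a 2-point difference is {large_effect} SDs")
```

The same 2-point difference is a tenth of an SD in the first case and four SDs in the second.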

Precaution(s)

None



STANDARD ERROR OF MEASUREMENT (SEM)
Definition

The SEM is an estimate of the magnitude of error in a test score.
Possible causes of error in scores include lucky or unlucky guesses, a
student's not feeling well or failing to follow directions, the fact
that test questions may be only a sample of those that could be asked,
sloppiness, laziness, etc.

Use

1. The SEM provides a way of determining the possible fluctuation in
test scores which would be obtained if an individual were to take
the same test a number of times. It indicates how far a
particular obtained score might deviate from the individual's
"true" score (the score the individual would obtain if there were
no error in the test). It is usually assumed that the scores
obtained from repeated testing would conform to the normal curve
distribution. Therefore, in practice, it is assumed that there is
a probability of 68:100 that the "true" score is within one SEM of
the obtained score and that there is a probability of 95:100 that
the "true" score is within two SEMs of the obtained score.

2. The SEM can be used in significance testing to provide a way of
determining whether differences in test scores or group mean
scores are statistically significant (that they vary more than can
be reasonably attributed to testing error).
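The bands described in Use 1 can be written out as a short sketch. The obtained score of 50 and the SEM of 3 are hypothetical values. An actual SEM must be taken from the test's technical manual.

```python
def score_band(obtained, sem, n_sems):
    """Range within n_sems standard errors of the obtained score."""
    return (obtained - n_sems * sem, obtained + n_sems * sem)


obtained_score = 50  # hypothetical obtained score
sem = 3              # hypothetical SEM for this test

band_68 = score_band(obtained_score, sem, 1)  # about 68 chances in 100
band_95 = score_band(obtained_score, sem, 2)  # about 95 chances in 100
print(f"68:100 band: {band_68}")
print(f"95:100 band: {band_95}")
```

Reporting the band rather than the single obtained score makes the amount of measurement error visible to the reader.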

Precaution(s)



None 



STANINE 



Definition

A stanine is one of the scores of a nine-point division of the normal 
distribution.* Stanine scores range from 1 to 9 with a mean and median 
of 5. As shown in Figure 3.2, each stanine has a range of 
corresponding percentile ranks or raw scores. 

Use

1. Stanines can be subjected to arithmetic operations (addition,
etc.). Therefore, the mean of distributions can be computed, and
differences in stanine scores can be compared at all points in the
distribution, except in some cases at the extreme stanine scores
of 1 and 9.






2. Stanines do not give a false sense of accuracy of a given score
because each stanine covers a range of raw scores. The stanine
scale is therefore useful for reporting individuals' scores.
Differences in stanines are more likely to represent change beyond
that which can be attributed to error than are other kinds of
scores.

Precaution(s) 



As can be seen in Figure 3.2, interpretation of differences in stanine
scores is clouded by the range within a given stanine. For example, if
an individual's score increases from the top of the Stanine-3 range to
the bottom of the Stanine-5 range, it represents less improvement than
an increase from the bottom of the Stanine-3 range to the top of the
Stanine-4 range. However, on cursory examination it would seem as if
the first increase were the greater.
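The precaution can be illustrated with a sketch that maps percentile ranks to stanines. The cut points below follow a commonly used convention (stanines holding roughly 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores). They are an assumption and are not taken from the handbook, and publishers' tables may treat boundary values slightly differently.

```python
from bisect import bisect_right

# Lowest percentile rank belonging to stanines 2 through 9 (assumed convention).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Stanine (1-9) for a percentile rank under the assumed cut points."""
    return bisect_right(STANINE_CUTS, percentile_rank) + 1


# Top of Stanine 3 to bottom of Stanine 5: small move, two-stanine gain.
first_gain = stanine(40) - stanine(22)
# Bottom of Stanine 3 to top of Stanine 4: larger move, one-stanine gain.
second_gain = stanine(39) - stanine(11)

print(f"PR 22 -> 40: {40 - 22} PR points, {first_gain} stanines")
print(f"PR 11 -> 39: {39 - 11} PR points, {second_gain} stanine")
```

Under these cut points the first student moves 18 percentile points and gains two stanines, while the second moves 28 percentile points and gains only one.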



STATISTICAL SIGNIFICANCE TEST 

Definition

A significance test is a statistical procedure used to determine if two
(or more) groups differ on a trait more than could normally be expected
if testing error or sampling error were assumed to be the cause of the
difference.

Use

Under highly controlled conditions (as in experiments, etc.), tests of
statistical significance are used to test hypotheses. When variables
cannot be controlled (as in the countywide testing program), the
results from such a test are open to question.

Precaution(s)

1. Results of significance tests are reported as probability
statements. If the reported probability is less than .01, the
chance is less than 1:100 that the difference between groups can
be attributed to testing error. If the probability is less than
.001, the chance is less than 1:1000 that the difference can be
attributed to testing error. However, there is always some chance
(1:1000, etc.) that the difference was caused by error.

2. When a large number of tests of significance are performed, some
differences will turn out to be statistically significant by
chance alone. That is, since there is always some chance that a
difference can be caused by error (1:20, 1:100, 1:1000, etc.), a
certain number of significant differences can be expected to occur
because of error. There is no way to determine if a particular
statistically significant difference was or was not caused by
error. Again, only a probability can be determined.



3. When tests of significance are used to evaluate the difference of
means, the larger the group, the smaller the difference in means
needs to be for statistical significance. The smaller the group,
the larger the difference must be. For example, a difference of
only 1-2 months on the grade equivalent scale, or a fraction of a
raw-score point, will be statistically significant for groups of
several thousand students. In contrast, a difference of as much
as six months may be required for significance with a group of one
hundred students. Because many of the comparisons in this report
involve very large groups, no significance tests of differences in
means were performed. While small differences would have been
statistically significant, they would not have been educationally
meaningful.
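The dependence on group size in Precaution 3 can be sketched with a simple large-sample approximation. The assumptions here are not from the handbook: two groups of equal size, the same standard deviation in each group, and a two-sided test at the .05 level (critical value 1.96). The SD of 10 score points is hypothetical.

```python
import math


def minimum_significant_difference(sd, group_size, z_critical=1.96):
    """Smallest difference in means reaching significance (z approximation)."""
    standard_error = sd * math.sqrt(2 / group_size)
    return z_critical * standard_error


for group_size in (100, 1000, 4000):
    needed = minimum_significant_difference(sd=10, group_size=group_size)
    print(f"groups of {group_size:4d}: difference of {needed:.2f} points needed")
```

With groups of several thousand students, a difference of less than half a point becomes statistically significant under these assumptions, whether or not it matters educationally.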

VALIDITY
Definition

The extent to which a test does the job for which it is used. There
are three major types of validity that a test may possess.

1. Content validity is most important for achievement tests. This
requires a test to contain questions that adequately reflect the
content the test is supposed to measure.

2. Criterion-related validity is most important for placement tests,
college admissions tests, or tests on which employment decisions
are based. Performance on the test must be highly correlated with
performance in the program, success in college, or success on the
job for which the test is a screening instrument.

3. Construct validity is most important in psychological instruments.
Tests of ability are examples of such instruments. Construct
validity requires that the test adequately discriminate between
people who do or do not have a particular trait.

Use

Validity is a measure or concept that helps one evaluate the quality of
a test.

Precaution(s)

The type of validity appropriate for a given testing situation should
be used.






